Overview

Dataset statistics

                                  Dataset A       Dataset B
Number of variables               12              12
Number of observations            446             446
Missing cells                     426             432
Missing cells (%)                 8.0%            8.1%
Duplicate rows                    0               0
Duplicate rows (%)                0.0%            0.0%
Total size in memory              45.3 KiB        45.3 KiB
Average record size in memory     104.0 B         104.0 B
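
The percentage and per-record figures in the table above follow from the raw counts. The short Python sketch below reproduces them as a sanity check; the function names are illustrative and not part of ydata-profiling, and it assumes that the missing-cell percentage is taken over rows times columns and that 1 KiB is 1024 bytes.

```python
# Figures copied from the Dataset statistics table.
N_VARIABLES = 12
N_OBSERVATIONS = 446
MISSING_CELLS = {"Dataset A": 426, "Dataset B": 432}
TOTAL_SIZE_KIB = 45.3


def missing_cell_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Share of empty cells as a percentage of all rows x columns."""
    return 100.0 * missing_cells / (n_rows * n_cols)


def avg_record_size_bytes(total_kib: float, n_rows: int) -> float:
    """Average bytes per row, assuming 1 KiB = 1024 bytes."""
    return total_kib * 1024 / n_rows


missing_pct = {
    name: round(missing_cell_pct(cells, N_OBSERVATIONS, N_VARIABLES), 1)
    for name, cells in MISSING_CELLS.items()
}
record_size = round(avg_record_size_bytes(TOTAL_SIZE_KIB, N_OBSERVATIONS), 1)

print(missing_pct)
print(record_size)
```

Both rounded results agree with the values reported for Dataset A and Dataset B.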

Variable types

                                  Dataset A       Dataset B
Numeric                           5               5
Categorical                       4               4
Text                              3               3

Alerts

Alert type        Dataset A                                        Dataset B
High Correlation  Sex is highly overall correlated with Survived   Sex is highly overall correlated with Survived
High Correlation  Survived is highly overall correlated with Sex   Survived is highly overall correlated with Sex
Missing           Age has 84 (18.8%) missing values                Age has 88 (19.7%) missing values
Missing           Cabin has 341 (76.5%) missing values             Cabin has 342 (76.7%) missing values
Unique            PassengerId has unique values                    PassengerId has unique values
Unique            Name has unique values                           Name has unique values
Zeros             SibSp has 294 (65.9%) zeros                      SibSp has 306 (68.6%) zeros
Zeros             Parch has 334 (74.9%) zeros                      Parch has 348 (78.0%) zeros
Zeros             Fare has 6 (1.3%) zeros                          Fare has 9 (2.0%) zeros

Reproduction

                         Dataset A                    Dataset B
Analysis started         2023-11-22 10:41:21.993689   2023-11-22 10:41:26.822515
Analysis finished        2023-11-22 10:41:26.821441   2023-11-22 10:41:30.213952
Duration                 4.83 seconds                 3.39 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json
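
The Duration row is simply the gap between the start and finish timestamps. A minimal sketch that recomputes it from the timestamps printed in this section is shown here; the helper name `duration_seconds` is hypothetical.

```python
from datetime import datetime

# Start and finish timestamps copied from the Reproduction table.
RUNS = {
    "Dataset A": ("2023-11-22 10:41:21.993689", "2023-11-22 10:41:26.821441"),
    "Dataset B": ("2023-11-22 10:41:26.822515", "2023-11-22 10:41:30.213952"),
}


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed wall-clock time between two ISO-formatted timestamps."""
    delta = datetime.fromisoformat(finished) - datetime.fromisoformat(started)
    return delta.total_seconds()


durations = {
    name: round(duration_seconds(started, finished), 2)
    for name, (started, finished) in RUNS.items()
}
print(durations)
```

Rounded to two decimals, the results match the 4.83 and 3.39 seconds reported for the two runs.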

Variables

PassengerId
Real number (ℝ)

                                  Dataset A       Dataset B
Distinct                          446             446
Distinct (%)                      100.0%          100.0%
Missing                           0               0
Missing (%)                       0.0%            0.0%
Infinite                          0               0
Infinite (%)                      0.0%            0.0%
Mean                              457.05605       455.97085
Minimum                           1               1
Maximum                           891             890
Zeros                             0               0
Zeros (%)                         0.0%            0.0%
Negative                          0               0
Negative (%)                      0.0%            0.0%
Memory size                       7.0 KiB         7.0 KiB

Quantile statistics

                                  Dataset A       Dataset B
Minimum                           1               1
5-th percentile                   40.5            48.75
Q1                                237             230.25
median                            461.5           458
Q3                                686.75          681.75
95-th percentile                  846.75          855.75
Maximum                           891             890
Range                             890             889
Interquartile range (IQR)         449.75          451.5
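
Fractional quantiles such as 230.25 typically arise from linear interpolation between neighbouring sorted values. The sketch below assumes that interpolation rule (the report does not state which one it uses) and applies it to the ten lowest PassengerId values listed for Dataset A further down; it then checks the Range and IQR rows against the reported quartiles. The `percentile` helper is illustrative only.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile of an ascending list, with q in [0, 1]."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# Toy input: the ten lowest PassengerId values listed for Dataset A.
sample_ids = [1, 2, 4, 6, 7, 8, 9, 12, 14, 15]
q1 = percentile(sample_ids, 0.25)
q3 = percentile(sample_ids, 0.75)
iqr = q3 - q1

# Range and IQR recomputed from the reported quartiles and extremes.
report_iqr = {"Dataset A": 686.75 - 237, "Dataset B": 681.75 - 230.25}
report_range = {"Dataset A": 891 - 1, "Dataset B": 890 - 1}

print(q1, q3, iqr)
print(report_iqr, report_range)
```

The recomputed IQR and Range agree with the table for both datasets.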

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                259.63879       260.86626
Coefficient of variation (CV)     0.56806771      0.5721117
Kurtosis                          -1.1973026      -1.23997
Mean                              457.05605       455.97085
Median Absolute Deviation (MAD)   225.5           226.5
Skewness                          -0.092729601    -0.030955301
Sum                               203847          203363
Variance                          67412.3         68051.206
Monotonicity                      Not monotonic   Not monotonic
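
Several rows in the descriptive statistics are derived from others: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations. The sketch below checks these identities against the PassengerId figures; the dictionary layout and the tolerance are choices made for this illustration.

```python
import math

N_OBSERVATIONS = 446

# PassengerId figures copied from the descriptive statistics.
STATS = {
    "Dataset A": {"mean": 457.05605, "std": 259.63879,
                  "cv": 0.56806771, "variance": 67412.3, "sum": 203847},
    "Dataset B": {"mean": 455.97085, "std": 260.86626,
                  "cv": 0.5721117, "variance": 68051.206, "sum": 203363},
}


def derived(mean: float, std: float, n: int) -> dict:
    """Quantities that follow directly from the mean, std and row count."""
    return {"cv": std / mean, "variance": std ** 2, "sum": mean * n}


consistent = {}
for name, stats in STATS.items():
    values = derived(stats["mean"], stats["std"], N_OBSERVATIONS)
    # Compare with a relative tolerance because the report rounds its output.
    consistent[name] = all(
        math.isclose(values[key], stats[key], rel_tol=1e-5) for key in values
    )

print(consistent)
```

For both datasets the derived values agree with the reported ones to within rounding.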
Histogram with fixed size bins (bins=50)
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
191                 1       0.2%            106                 1       0.2%
527                 1       0.2%            708                 1       0.2%
83                  1       0.2%            689                 1       0.2%
256                 1       0.2%            409                 1       0.2%
167                 1       0.2%            171                 1       0.2%
428                 1       0.2%            450                 1       0.2%
687                 1       0.2%            269                 1       0.2%
610                 1       0.2%            498                 1       0.2%
28                  1       0.2%            878                 1       0.2%
327                 1       0.2%            545                 1       0.2%
Other values (436)  436     97.8%           Other values (436)  436     97.8%

Minimum 10 values

Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
1                   1       0.2%            1                   1       0.2%
2                   1       0.2%            3                   1       0.2%
4                   1       0.2%            4                   1       0.2%
6                   1       0.2%            5                   1       0.2%
7                   1       0.2%            7                   1       0.2%
8                   1       0.2%            8                   1       0.2%
9                   1       0.2%            9                   1       0.2%
12                  1       0.2%            14                  1       0.2%
14                  1       0.2%            17                  1       0.2%
15                  1       0.2%            20                  1       0.2%

Survived
Categorical

                                  Dataset A       Dataset B
Distinct                          2               2
Distinct (%)                      0.4%            0.4%
Missing                           0               0
Missing (%)                       0.0%            0.0%
Memory size                       7.0 KiB         7.0 KiB

Value counts: 0 (287), 1 (159) in Dataset A; 0 (275), 1 (171) in Dataset B.

Length

                                  Dataset A       Dataset B
Max length                        1               1
Median length                     1               1
Mean length                       1               1
Min length                        1               1

Characters and Unicode

                                  Dataset A       Dataset B
Total characters                  446             446
Distinct characters               2               2
Distinct categories               1               1
Distinct scripts                  1               1
Distinct blocks                   1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

                                  Dataset A       Dataset B
Unique                            0               0
Unique (%)                        0.0%            0.0%

Sample

                                  Dataset A       Dataset B
1st row                           1               0
2nd row                           0               1
3rd row                           1               0
4th row                           0               1
5th row                           0               0

Common Values

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
0                   287     64.3%           275     61.7%
1                   159     35.7%           171     38.3%
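
A common-values table like the one above can be rebuilt from a raw column by counting each value and dividing by the total. The following sketch does this with the standard library; the input lists are synthetic stand-ins reconstructed from the reported counts, not the original data, and `frequency_table` is a hypothetical helper.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percentage) rows, most frequent value first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (value, count, f"{100 * count / total:.1f}%")
        for value, count in counts.most_common()
    ]


# Synthetic columns rebuilt from the reported Survived counts.
dataset_a = [0] * 287 + [1] * 159
dataset_b = [0] * 275 + [1] * 171

table_a = frequency_table(dataset_a)
table_b = frequency_table(dataset_b)
print(table_a)
print(table_b)
```

The percentages produced this way match the Frequency (%) column for both datasets.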

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Dataset A and Dataset B]

Most occurring characters

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
0                   287     64.3%           275     61.7%
1                   159     35.7%           171     38.3%

Most occurring categories

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
Decimal Number      446     100.0%          446     100.0%

Most frequent character per category

Decimal Number
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
0                   287     64.3%           275     61.7%
1                   159     35.7%           171     38.3%

Most occurring scripts

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
Common              446     100.0%          446     100.0%

Most frequent character per script

Common
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
0                   287     64.3%           275     61.7%
1                   159     35.7%           171     38.3%

Most occurring blocks

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
ASCII               446     100.0%          446     100.0%

Most frequent character per block

ASCII
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
0                   287     64.3%           275     61.7%
1                   159     35.7%           171     38.3%

Pclass
Categorical

                                  Dataset A       Dataset B
Distinct                          3               3
Distinct (%)                      0.7%            0.7%
Missing                           0               0
Missing (%)                       0.0%            0.0%
Memory size                       7.0 KiB         7.0 KiB

Value counts: 3 (254), 1 (108), 2 (84) in Dataset A; 3 (245), 1 (111), 2 (90) in Dataset B.

Length

                                  Dataset A       Dataset B
Max length                        1               1
Median length                     1               1
Mean length                       1               1
Min length                        1               1

Characters and Unicode

                                  Dataset A       Dataset B
Total characters                  446             446
Distinct characters               3               3
Distinct categories               1               1
Distinct scripts                  1               1
Distinct blocks                   1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

                                  Dataset A       Dataset B
Unique                            0               0
Unique (%)                        0.0%            0.0%

Sample

                                  Dataset A       Dataset B
1st row                           2               3
2nd row                           3               1
3rd row                           3               2
4th row                           2               3
5th row                           3               2

Common Values

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
3                   254     57.0%           245     54.9%
1                   108     24.2%           111     24.9%
2                   84      18.8%           90      20.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Dataset A and Dataset B]

Most occurring characters

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
3                   254     57.0%           245     54.9%
1                   108     24.2%           111     24.9%
2                   84      18.8%           90      20.2%

Most occurring categories

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
Decimal Number      446     100.0%          446     100.0%

Most frequent character per category

Decimal Number
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
3                   254     57.0%           245     54.9%
1                   108     24.2%           111     24.9%
2                   84      18.8%           90      20.2%

Most occurring scripts

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
Common              446     100.0%          446     100.0%

Most frequent character per script

Common
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
3                   254     57.0%           245     54.9%
1                   108     24.2%           111     24.9%
2                   84      18.8%           90      20.2%

Most occurring blocks

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
ASCII               446     100.0%          446     100.0%

Most frequent character per block

ASCII
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
3                   254     57.0%           245     54.9%
1                   108     24.2%           111     24.9%
2                   84      18.8%           90      20.2%

Name
Text

                                  Dataset A       Dataset B
Distinct                          446             446
Distinct (%)                      100.0%          100.0%
Missing                           0               0
Missing (%)                       0.0%            0.0%
Memory size                       7.0 KiB         7.0 KiB

Length

                                  Dataset A       Dataset B
Max length                        82              82
Median length                     50              50
Mean length                       26.865471       26.710762
Min length                        12              12

Characters and Unicode

                                  Dataset A       Dataset B
Total characters                  11982           11913
Distinct characters               59              60
Distinct categories               7               7
Distinct scripts                  2               2
Distinct blocks                   1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
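
To illustrate how such character properties can be read programmatically, the sketch below uses Python's standard `unicodedata` module to count general categories in one of the sample names shown in this section. This is not the report's own implementation; the mapping from two-letter category codes to the long names used in this report is written out by hand, and the standard library does not expose scripts or blocks, so only categories are covered.

```python
import unicodedata
from collections import Counter

# Hand-written mapping from Unicode general-category codes to the long
# names that appear in this report.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category."""
    counts = Counter()
    for char in text:
        code = unicodedata.category(char)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample_name = "Pinsky, Mrs. (Rosa)"
name_categories = category_counts(sample_name)
print(name_categories.most_common())
```

Summing such counts over every value in a column gives a per-category breakdown of the kind reported for each text and categorical variable.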

Unique

                                  Dataset A       Dataset B
Unique                            446             446
Unique (%)                        100.0%          100.0%

Sample

          Dataset A                        Dataset B
1st row   Pinsky, Mrs. (Rosa)              Mionoff, Mr. Stoytcho
2nd row   Jensen, Mr. Svend Lauritz        Bradley, Mr. George ("George Arthur Brayton")
3rd row   Baclini, Miss. Marie Catherine   Gill, Mr. John William
4th row   Chapman, Mr. Charles Henry       de Mulder, Mr. Theodore
5th row   Naidenoff, Mr. Penko             Mitchell, Mr. Henry Michael

Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
mr                  269     14.8%           mr                  275     15.2%
miss                82      4.5%            miss                76      4.2%
mrs                 70      3.8%            mrs                 67      3.7%
john                27      1.5%            william             30      1.7%
william             24      1.3%            john                21      1.2%
henry               18      1.0%            henry               18      1.0%
master              16      0.9%            master              17      0.9%
thomas              12      0.7%            edward              14      0.8%
edward              12      0.7%            thomas              12      0.7%
elizabeth           11      0.6%            james               11      0.6%
Other values (896)  1280    70.3%           Other values (907)  1269    70.1%
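
The entries in the table above are lowercased word tokens with surrounding punctuation removed (for example "mr" rather than "Mr."). The report does not say how it tokenizes, so the following is only a plausible sketch under that assumption: split on whitespace, lowercase, and strip leading and trailing punctuation. It is applied to the five Dataset A sample names; `word_counts` is a hypothetical helper.

```python
import string
from collections import Counter


def word_counts(values):
    """Count lowercase whitespace-separated tokens, trimming edge punctuation."""
    counts = Counter()
    for value in values:
        for token in value.lower().split():
            token = token.strip(string.punctuation)
            if token:
                counts[token] += 1
    return counts


# The five Dataset A sample rows of the Name variable.
sample_names = [
    "Pinsky, Mrs. (Rosa)",
    "Jensen, Mr. Svend Lauritz",
    "Baclini, Miss. Marie Catherine",
    "Chapman, Mr. Charles Henry",
    "Naidenoff, Mr. Penko",
]
name_words = word_counts(sample_names)
print(name_words.most_common(3))
```

Because only the edges of each token are trimmed, internal punctuation survives, which would be consistent with tokens such as "c.a" that the report lists for the Ticket variable.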

Most occurring characters

Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
(space)             1376    11.5%           (space)             1365    11.5%
r                   951     7.9%            r                   996     8.4%
a                   833     7.0%            e                   850     7.1%
e                   829     6.9%            a                   818     6.9%
n                   673     5.6%            n                   646     5.4%
s                   637     5.3%            i                   644     5.4%
i                   623     5.2%            s                   629     5.3%
M                   565     4.7%            M                   542     4.5%
l                   522     4.4%            l                   536     4.5%
o                   521     4.3%            o                   483     4.1%
Other values (49)   4452    37.2%           Other values (50)   4404    37.0%

Most occurring categories

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
Lowercase Letter    7676    64.1%           7620    64.0%
Uppercase Letter    1828    15.3%           1818    15.3%
Space Separator     1376    11.5%           1365    11.5%
Other Punctuation   939     7.8%            950     8.0%
Close Punctuation   78      0.7%            77      0.6%
Open Punctuation    78      0.7%            77      0.6%
Dash Punctuation    7       0.1%            6       0.1%

Most frequent character per category

Space Separator
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
(space)             1376    100.0%          1365    100.0%

Lowercase Letter
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
r                   951     12.4%           r                   996     13.1%
a                   833     10.9%           e                   850     11.2%
e                   829     10.8%           a                   818     10.7%
n                   673     8.8%            n                   646     8.5%
s                   637     8.3%            i                   644     8.5%
i                   623     8.1%            s                   629     8.3%
l                   522     6.8%            l                   536     7.0%
o                   521     6.8%            o                   483     6.3%
t                   344     4.5%            t                   304     4.0%
h                   260     3.4%            d                   265     3.5%
Other values (16)   1483    19.3%           Other values (16)   1449    19.0%

Uppercase Letter
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
M                   565     30.9%           M                   542     29.8%
A                   127     6.9%            A                   129     7.1%
J                   116     6.3%            J                   111     6.1%
H                   92      5.0%            H                   95      5.2%
S                   90      4.9%            C                   92      5.1%
E                   83      4.5%            E                   89      4.9%
B                   82      4.5%            S                   79      4.3%
C                   71      3.9%            L                   78      4.3%
W                   67      3.7%            B                   72      4.0%
L                   62      3.4%            W                   65      3.6%
Other values (15)   473     25.9%           Other values (15)   466     25.6%

Other Punctuation
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
.                   447     47.6%           .                   446     46.9%
,                   446     47.5%           ,                   446     46.9%
"                   42      4.5%            "                   54      5.7%
'                   4       0.4%            '                   3       0.3%
                                            /                   1       0.1%

Close Punctuation
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
)                   78      100.0%          77      100.0%

Open Punctuation
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
(                   78      100.0%          77      100.0%

Dash Punctuation
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
-                   7       100.0%          6       100.0%

Most occurring scripts

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
Latin               9504    79.3%           9438    79.2%
Common              2478    20.7%           2475    20.8%

Most frequent character per script

Common
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
(space)             1376    55.5%           (space)             1365    55.2%
.                   447     18.0%           .                   446     18.0%
,                   446     18.0%           ,                   446     18.0%
)                   78      3.1%            )                   77      3.1%
(                   78      3.1%            (                   77      3.1%
"                   42      1.7%            "                   54      2.2%
-                   7       0.3%            -                   6       0.2%
'                   4       0.2%            '                   3       0.1%
                                            /                   1       < 0.1%

Latin
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
r                   951     10.0%           r                   996     10.6%
a                   833     8.8%            e                   850     9.0%
e                   829     8.7%            a                   818     8.7%
n                   673     7.1%            n                   646     6.8%
s                   637     6.7%            i                   644     6.8%
i                   623     6.6%            s                   629     6.7%
M                   565     5.9%            M                   542     5.7%
l                   522     5.5%            l                   536     5.7%
o                   521     5.5%            o                   483     5.1%
t                   344     3.6%            t                   304     3.2%
Other values (41)   3006    31.6%           Other values (41)   2990    31.7%

Most occurring blocks

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
ASCII               11982   100.0%          11913   100.0%

Most frequent character per block

ASCII
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
(space)             1376    11.5%           (space)             1365    11.5%
r                   951     7.9%            r                   996     8.4%
a                   833     7.0%            e                   850     7.1%
e                   829     6.9%            a                   818     6.9%
n                   673     5.6%            n                   646     5.4%
s                   637     5.3%            i                   644     5.4%
i                   623     5.2%            s                   629     5.3%
M                   565     4.7%            M                   542     4.5%
l                   522     4.4%            l                   536     4.5%
o                   521     4.3%            o                   483     4.1%
Other values (49)   4452    37.2%           Other values (50)   4404    37.0%

Sex
Categorical

                                  Dataset A       Dataset B
Distinct                          2               2
Distinct (%)                      0.4%            0.4%
Missing                           0               0
Missing (%)                       0.0%            0.0%
Memory size                       7.0 KiB         7.0 KiB

Value counts: male (294), female (152) in Dataset A; male (302), female (144) in Dataset B.

Length

                                  Dataset A       Dataset B
Max length                        6               6
Median length                     4               4
Mean length                       4.6816143       4.6457399
Min length                        4               4

Characters and Unicode

                                  Dataset A       Dataset B
Total characters                  2088            2072
Distinct characters               5               5
Distinct categories               1               1
Distinct scripts                  1               1
Distinct blocks                   1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
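
For a categorical variable with few distinct values, the length and character totals follow directly from the value counts. The sketch below recomputes them for Sex from the reported counts of "male" and "female"; `length_summary` is an illustrative helper, not a ydata-profiling function.

```python
# Reported value counts for the Sex variable.
SEX_COUNTS = {
    "Dataset A": {"male": 294, "female": 152},
    "Dataset B": {"male": 302, "female": 144},
}


def length_summary(counts: dict) -> dict:
    """Length statistics of a column described by {value: occurrences}."""
    total_values = sum(counts.values())
    total_characters = sum(len(value) * n for value, n in counts.items())
    return {
        "total_characters": total_characters,
        "mean_length": round(total_characters / total_values, 7),
        "min_length": min(len(value) for value in counts),
        "max_length": max(len(value) for value in counts),
    }


sex_lengths = {name: length_summary(counts) for name, counts in SEX_COUNTS.items()}
print(sex_lengths)
```

The totals (2088 and 2072 characters) and mean lengths (4.6816143 and 4.6457399) agree with the tables above.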

Unique

                                  Dataset A       Dataset B
Unique                            0               0
Unique (%)                        0.0%            0.0%

Sample

                                  Dataset A       Dataset B
1st row                           female          male
2nd row                           male            male
3rd row                           female          male
4th row                           male            male
5th row                           male            male

Common Values

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
male                294     65.9%           302     67.7%
female              152     34.1%           144     32.3%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Dataset A and Dataset B]

Most occurring characters

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
e                   598     28.6%           590     28.5%
m                   446     21.4%           446     21.5%
a                   446     21.4%           446     21.5%
l                   446     21.4%           446     21.5%
f                   152     7.3%            144     6.9%

Most occurring categories

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
Lowercase Letter    2088    100.0%          2072    100.0%

Most frequent character per category

Lowercase Letter
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
e                   598     28.6%           590     28.5%
m                   446     21.4%           446     21.5%
a                   446     21.4%           446     21.5%
l                   446     21.4%           446     21.5%
f                   152     7.3%            144     6.9%

Most occurring scripts

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
Latin               2088    100.0%          2072    100.0%

Most frequent character per script

Latin
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
e                   598     28.6%           590     28.5%
m                   446     21.4%           446     21.5%
a                   446     21.4%           446     21.5%
l                   446     21.4%           446     21.5%
f                   152     7.3%            144     6.9%

Most occurring blocks

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
ASCII               2088    100.0%          2072    100.0%

Most frequent character per block

ASCII
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
e                   598     28.6%           590     28.5%
m                   446     21.4%           446     21.5%
a                   446     21.4%           446     21.5%
l                   446     21.4%           446     21.5%
f                   152     7.3%            144     6.9%

Age
Real number (ℝ)

                                  Dataset A       Dataset B
Distinct                          76              79
Distinct (%)                      21.0%           22.1%
Missing                           84              88
Missing (%)                       18.8%           19.7%
Infinite                          0               0
Infinite (%)                      0.0%            0.0%
Mean                              30.072072       30.973715
Minimum                           0.42            0.42
Maximum                           80              71
Zeros                             0               0
Zeros (%)                         0.0%            0.0%
Negative                          0               0
Negative (%)                      0.0%            0.0%
Memory size                       7.0 KiB         7.0 KiB
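
The two percentage rows for Age use different denominators. The reported numbers are consistent with Missing (%) being taken over all 446 observations and Distinct (%) over the non-missing observations only; that reading is inferred from the figures rather than stated in the report. A small sketch of the arithmetic:

```python
N_OBSERVATIONS = 446

# Reported counts for the Age variable.
AGE_COUNTS = {
    "Dataset A": {"distinct": 76, "missing": 84},
    "Dataset B": {"distinct": 79, "missing": 88},
}


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal."""
    return round(100 * part / whole, 1)


age_pct = {
    name: {
        # Missing share is relative to every row.
        "missing_pct": pct(c["missing"], N_OBSERVATIONS),
        # Distinct share is relative to rows that actually hold a value.
        "distinct_pct": pct(c["distinct"], N_OBSERVATIONS - c["missing"]),
    }
    for name, c in AGE_COUNTS.items()
}
print(age_pct)
```

With these denominators the results reproduce 18.8% and 21.0% for Dataset A and 19.7% and 22.1% for Dataset B.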

Quantile statistics

                                  Dataset A       Dataset B
Minimum                           0.42            0.42
5-th percentile                   6.05            4.85
Q1                                20              21
median                            29              30
Q3                                38              40
95-th percentile                  55.95           59.15
Maximum                           80              71
Range                             79.58           70.58
Interquartile range (IQR)         18              19

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                14.350124       14.604977
Coefficient of variation (CV)     0.47719106      0.47152809
Kurtosis                          0.21040517      0.057642725
Mean                              30.072072       30.973715
Median Absolute Deviation (MAD)   9               9
Skewness                          0.42612785      0.33707854
Sum                               10886.09        11088.59
Variance                          205.92605       213.30534
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
18                  17      3.8%            36                  15      3.4%
19                  15      3.4%            18                  14      3.1%
24                  14      3.1%            24                  13      2.9%
22                  13      2.9%            21                  12      2.7%
25                  12      2.7%            30                  12      2.7%
21                  11      2.5%            27                  12      2.7%
31                  11      2.5%            22                  12      2.7%
27                  11      2.5%            19                  11      2.5%
29                  11      2.5%            31                  11      2.5%
33                  10      2.2%            25                  10      2.2%
Other values (66)   237     53.1%           Other values (69)   236     52.9%
(Missing)           84      18.8%           (Missing)           88      19.7%

Minimum 10 values

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
0.42                1       0.2%            1       0.2%
0.75                1       0.2%            1       0.2%
0.92                1       0.2%            1       0.2%
1                   3       0.7%            3       0.7%
2                   6       1.3%            4       0.9%
3                   1       0.2%            3       0.7%
4                   2       0.4%            5       1.1%
5                   2       0.4%            1       0.2%
6                   2       0.4%            1       0.2%
7                   1       0.2%            3       0.7%

SibSp
Real number (ℝ)

                                  Dataset A       Dataset B
Distinct                          7               7
Distinct (%)                      1.6%            1.6%
Missing                           0               0
Missing (%)                       0.0%            0.0%
Infinite                          0               0
Infinite (%)                      0.0%            0.0%
Mean                              0.57623318      0.48878924
Minimum                           0               0
Maximum                           8               8
Zeros                             294             306
Zeros (%)                         65.9%           68.6%
Negative                          0               0
Negative (%)                      0.0%            0.0%
Memory size                       7.0 KiB         7.0 KiB

Quantile statistics

                                  Dataset A       Dataset B
Minimum                           0               0
5-th percentile                   0               0
Q1                                0               0
median                            0               0
Q3                                1               1
95-th percentile                  3               2
Maximum                           8               8
Range                             8               8
Interquartile range (IQR)         1               1

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                1.1346514       0.99825009
Coefficient of variation (CV)     1.9690838       2.0422915
Kurtosis                          14.119952       17.217133
Mean                              0.57623318      0.48878924
Median Absolute Deviation (MAD)   0               0
Skewness                          3.277949        3.5165073
Sum                               257             218
Variance                          1.2874339       0.99650325
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=7)
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
0                   294     65.9%           0                   306     68.6%
1                   107     24.0%           1                   106     23.8%
2                   18      4.0%            2                   13      2.9%
4                   10      2.2%            4                   11      2.5%
3                   10      2.2%            3                   7       1.6%
5                   4       0.9%            8                   2       0.4%
8                   3       0.7%            5                   1       0.2%

Values in ascending order

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
0                   294     65.9%           306     68.6%
1                   107     24.0%           106     23.8%
2                   18      4.0%            13      2.9%
3                   10      2.2%            7       1.6%
4                   10      2.2%            11      2.5%
5                   4       0.9%            1       0.2%
8                   3       0.7%            2       0.4%
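
The SibSp histogram is described as using fixed size bins (bins=7). The plot itself did not survive extraction, but a fixed-width binning of the reported value counts can be sketched as follows. This assumes equal-width bins spanning the minimum to the maximum, with the last bin closed on the right (the convention NumPy's histogram uses); the report's exact binning rule is not stated, and `fixed_width_histogram` is a hypothetical helper.

```python
def fixed_width_histogram(value_counts: dict, bins: int) -> list:
    """Aggregate {value: count} into `bins` equal-width bins over [min, max]."""
    lowest, highest = min(value_counts), max(value_counts)
    if highest == lowest:
        # Degenerate case: every observation falls into the first bin.
        return [sum(value_counts.values())] + [0] * (bins - 1)
    width = (highest - lowest) / bins
    totals = [0] * bins
    for value, count in value_counts.items():
        # The maximum value is clamped into the last bin.
        index = min(int((value - lowest) / width), bins - 1)
        totals[index] += count
    return totals


# Reported SibSp value counts.
sibsp_a = {0: 294, 1: 107, 2: 18, 3: 10, 4: 10, 5: 4, 8: 3}
sibsp_b = {0: 306, 1: 106, 2: 13, 3: 7, 4: 11, 5: 1, 8: 2}

hist_a = fixed_width_histogram(sibsp_a, bins=7)
hist_b = fixed_width_histogram(sibsp_b, bins=7)
print(hist_a)
print(hist_b)
```

With seven bins over the range 0 to 8, each bin is 8/7 wide, so the values 0 and 1 share the first bin and one bin between 5 and 8 stays empty.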

Parch
Real number (ℝ)

                                  Dataset A       Dataset B
Distinct                          6               6
Distinct (%)                      1.3%            1.3%
Missing                           0               0
Missing (%)                       0.0%            0.0%
Infinite                          0               0
Infinite (%)                      0.0%            0.0%
Mean                              0.37892377      0.35426009
Minimum                           0               0
Maximum                           5               5
Zeros                             334             348
Zeros (%)                         74.9%           78.0%
Negative                          0               0
Negative (%)                      0.0%            0.0%
Memory size                       7.0 KiB         7.0 KiB

Quantile statistics

                                  Dataset A       Dataset B
Minimum                           0               0
5-th percentile                   0               0
Q1                                0               0
median                            0               0
Q3                                0.75            0
95-th percentile                  2               2
Maximum                           5               5
Range                             5               5
Interquartile range (IQR)         0.75            0

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                0.75989633      0.79609568
Coefficient of variation (CV)     2.0054069       2.2472068
Kurtosis                          7.5813073       10.202235
Mean                              0.37892377      0.35426009
Median Absolute Deviation (MAD)   0               0
Skewness                          2.4519294       2.9011138
Sum                               169             158
Variance                          0.57744243      0.63376833
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=6)
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
0                   334     74.9%           0                   348     78.0%
1                   67      15.0%           1                   56      12.6%
2                   38      8.5%            2                   33      7.4%
3                   4       0.9%            3                   3       0.7%
5                   2       0.4%            5                   3       0.7%
4                   1       0.2%            4                   3       0.7%

Values in ascending order

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
0                   334     74.9%           348     78.0%
1                   67      15.0%           56      12.6%
2                   38      8.5%            33      7.4%
3                   4       0.9%            3       0.7%
4                   1       0.2%            3       0.7%
5                   2       0.4%            3       0.7%

Ticket
Text

                                  Dataset A       Dataset B
Distinct                          382             385
Distinct (%)                      85.7%           86.3%
Missing                           0               0
Missing (%)                       0.0%            0.0%
Memory size                       7.0 KiB         7.0 KiB

Length

                                  Dataset A       Dataset B
Max length                        18              18
Median length                     17              17
Mean length                       6.7174888       6.7600897
Min length                        4               3

Characters and Unicode

                                  Dataset A       Dataset B
Total characters                  2996            3015
Distinct characters               35              32
Distinct categories               5               5
Distinct scripts                  2               2
Distinct blocks                   1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

                                  Dataset A       Dataset B
Unique                            331             336
Unique (%)                        74.2%           75.3%

Sample

                                  Dataset A       Dataset B
1st row                           234604          349207
2nd row                           350048          111427
3rd row                           2666            233866
4th row                           248731          345774
5th row                           349206          C.A. 24580

Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
pc                  27      4.8%            pc                  30      5.3%
c.a                 16      2.8%            c.a                 17      3.0%
a/5                 10      1.8%            a/5                 9       1.6%
ca                  7       1.2%            sc/paris            7       1.2%
2                   6       1.1%            347082              6       1.1%
ston/o              6       1.1%            2                   6       1.1%
soton/o.q           4       0.7%            ston/o              6       1.1%
f.c.c               4       0.7%            soton/oq            5       0.9%
w./c                4       0.7%            1601                4       0.7%
a/4                 4       0.7%            17474               3       0.5%
Other values (399)  475     84.4%           Other values (406)  469     83.5%

Most occurring characters

Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
3                   365     12.2%           1                   373     12.4%
1                   342     11.4%           3                   372     12.3%
2                   312     10.4%           2                   296     9.8%
7                   243     8.1%            7                   245     8.1%
4                   239     8.0%            4                   217     7.2%
6                   201     6.7%            6                   203     6.7%
5                   194     6.5%            5                   202     6.7%
0                   191     6.4%            0                   195     6.5%
9                   185     6.2%            9                   161     5.3%
8                   139     4.6%            8                   142     4.7%
Other values (25)   585     19.5%           Other values (22)   609     20.2%

Most occurring categories

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
Decimal Number      2411    80.5%           2406    79.8%
Uppercase Letter    314     10.5%           338     11.2%
Other Punctuation   145     4.8%            142     4.7%
Space Separator     117     3.9%            116     3.8%
Lowercase Letter    9       0.3%            13      0.4%

Most frequent character per category

Decimal Number
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
3                   365     15.1%           1                   373     15.5%
1                   342     14.2%           3                   372     15.5%
2                   312     12.9%           2                   296     12.3%
7                   243     10.1%           7                   245     10.2%
4                   239     9.9%            4                   217     9.0%
6                   201     8.3%            6                   203     8.4%
5                   194     8.0%            5                   202     8.4%
0                   191     7.9%            0                   195     8.1%
9                   185     7.7%            9                   161     6.7%
8                   139     5.8%            8                   142     5.9%

Space Separator
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
(space)             117     100.0%          116     100.0%

Other Punctuation
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
.                   97      66.9%           91      64.1%
/                   48      33.1%           51      35.9%

Uppercase Letter
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
C                   75      23.9%           C                   73      21.6%
O                   49      15.6%           O                   53      15.7%
A                   44      14.0%           P                   50      14.8%
P                   43      13.7%           A                   41      12.1%
S                   34      10.8%           S                   41      12.1%
N                   18      5.7%            N                   22      6.5%
T                   17      5.4%            T                   19      5.6%
Q                   8       2.5%            Q                   8       2.4%
W                   7       2.2%            I                   8       2.4%
F                   6       1.9%            W                   6       1.8%
Other values (6)    13      4.1%            Other values (5)    17      5.0%

Lowercase Letter
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
a                   3       33.3%           a                   4       30.8%
s                   2       22.2%           r                   3       23.1%
r                   1       11.1%           i                   3       23.1%
i                   1       11.1%           s                   3       23.1%
l                   1       11.1%
e                   1       11.1%

Most occurring scripts

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
Common              2673    89.2%           2664    88.4%
Latin               323     10.8%           351     11.6%

Most frequent character per script

Common
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
3                   365     13.7%           1                   373     14.0%
1                   342     12.8%           3                   372     14.0%
2                   312     11.7%           2                   296     11.1%
7                   243     9.1%            7                   245     9.2%
4                   239     8.9%            4                   217     8.1%
6                   201     7.5%            6                   203     7.6%
5                   194     7.3%            5                   202     7.6%
0                   191     7.1%            0                   195     7.3%
9                   185     6.9%            9                   161     6.0%
8                   139     5.2%            8                   142     5.3%
Other values (3)    262     9.8%            Other values (3)    258     9.7%

Latin
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
C                   75      23.2%           C                   73      20.8%
O                   49      15.2%           O                   53      15.1%
A                   44      13.6%           P                   50      14.2%
P                   43      13.3%           A                   41      11.7%
S                   34      10.5%           S                   41      11.7%
N                   18      5.6%            N                   22      6.3%
T                   17      5.3%            T                   19      5.4%
Q                   8       2.5%            Q                   8       2.3%
W                   7       2.2%            I                   8       2.3%
F                   6       1.9%            W                   6       1.7%
Other values (12)   22      6.8%            Other values (9)    30      8.5%

Most occurring blocks

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
ASCII               2996    100.0%          3015    100.0%

Most frequent character per block

ASCII
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
3                   365     12.2%           1                   373     12.4%
1                   342     11.4%           3                   372     12.3%
2                   312     10.4%           2                   296     9.8%
7                   243     8.1%            7                   245     8.1%
4                   239     8.0%            4                   217     7.2%
6                   201     6.7%            6                   203     6.7%
5                   194     6.5%            5                   202     6.7%
0                   191     6.4%            0                   195     6.5%
9                   185     6.2%            9                   161     5.3%
8                   139     4.6%            8                   142     4.7%
Other values (25)   585     19.5%           Other values (22)   609     20.2%

Fare
Real number (ℝ)

                                  Dataset A       Dataset B
Distinct                          181             179
Distinct (%)                      40.6%           40.1%
Missing                           0               0
Missing (%)                       0.0%            0.0%
Infinite                          0               0
Infinite (%)                      0.0%            0.0%
Mean                              33.269721       30.389648
Minimum                           0               0
Maximum                           512.3292        512.3292
Zeros                             6               9
Zeros (%)                         1.3%            2.0%
Negative                          0               0
Negative (%)                      0.0%            0.0%
Memory size                       7.0 KiB         7.0 KiB

Quantile statistics

                                  Dataset A       Dataset B
Minimum                           0               0
5-th percentile                   7.129175        7.0542
Q1                                7.8958          7.8958
median                            14.45625        13.93125
Q3                                31              30.5
95-th percentile                  112.67708       106.425
Maximum                           512.3292        512.3292
Range                             512.3292        512.3292
Interquartile range (IQR)         23.1042         22.6042

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                53.559629       45.117134
Coefficient of variation (CV)     1.6098611       1.4846218
Kurtosis                          32.276095       35.222513
Mean                              33.269721       30.389648
Median Absolute Deviation (MAD)   6.93335         6.63125
Skewness                          4.8214596       4.7708925
Sum                               14838.295       13553.783
Variance                          2868.6338       2035.5558
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
7.8958              27      6.1%            13                  22      4.9%
8.05                18      4.0%            8.05                21      4.7%
13                  16      3.6%            7.8958              20      4.5%
7.75                14      3.1%            10.5                14      3.1%
26                  14      3.1%            26                  13      2.9%
10.5                11      2.5%            7.75                12      2.7%
8.6625              9       2.0%            7.775               10      2.2%
7.925               9       2.0%            7.925               10      2.2%
7.2292              9       2.0%            7.2292              9       2.0%
7.8542              7       1.6%            0                   9       2.0%
Other values (171)  312     70.0%           Other values (169)  306     68.6%

Minimum 10 values

Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
0                   6       1.3%            0                   9       2.0%
6.2375              1       0.2%            5                   1       0.2%
6.4958              2       0.4%            6.2375              1       0.2%
6.75                1       0.2%            6.4375              1       0.2%
6.8583              1       0.2%            6.45                1       0.2%
6.95                1       0.2%            6.4958              1       0.2%
6.975               1       0.2%            6.75                2       0.4%
7.0458              1       0.2%            6.95                1       0.2%
7.05                4       0.9%            6.975               1       0.2%
7.0542              2       0.4%            7.0458              1       0.2%

Cabin
Text

                                  Dataset A       Dataset B
Distinct                          87              88
Distinct (%)                      82.9%           84.6%
Missing                           341             342
Missing (%)                       76.5%           76.7%
Memory size                       7.0 KiB         7.0 KiB

Length

                                  Dataset A       Dataset B
Max length                        15              11
Median length                     3               3
Mean length                       3.7619048       3.5384615
Min length                        1               1

Characters and Unicode

                                  Dataset A       Dataset B
Total characters                  395             368
Distinct characters               19              18
Distinct categories               3               3
Distinct scripts                  2               2
Distinct blocks                   1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

                                  Dataset A       Dataset B
Unique                            71              72
Unique (%)                        67.6%           69.2%

Sample

                                  Dataset A       Dataset B
1st row                           E67             B18
2nd row                           E40             C22 C26
3rd row                           C92             C78
4th row                           F G63           C83
5th row                           D36             B77

Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
c23                 4       3.2%            b28                 2       1.7%
c27                 4       3.2%            e8                  2       1.7%
c25                 4       3.2%            b96                 2       1.7%
c78                 2       1.6%            f33                 2       1.7%
c92                 2       1.6%            c83                 2       1.7%
c65                 2       1.6%            c78                 2       1.7%
b20                 2       1.6%            c26                 2       1.7%
e33                 2       1.6%            c22                 2       1.7%
c26                 2       1.6%            c68                 2       1.7%
c22                 2       1.6%            c65                 2       1.7%
Other values (87)   99      79.2%           Other values (87)   97      82.9%

Most occurring characters

Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
C                   50      12.7%           C                   43      11.7%
2                   47      11.9%           1                   37      10.1%
1                   38      9.6%            2                   36      9.8%
3                   32      8.1%            B                   28      7.6%
6                   28      7.1%            3                   26      7.1%
B                   26      6.6%            6                   26      7.1%
5                   22      5.6%            5                   25      6.8%
(space)             20      5.1%            8                   25      6.8%
0                   19      4.8%            4                   18      4.9%
D                   19      4.8%            0                   17      4.6%
Other values (9)    94      23.8%           Other values (8)    87      23.6%

Most occurring categories

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
Decimal Number      250     63.3%           238     64.7%
Uppercase Letter    125     31.6%           117     31.8%
Space Separator     20      5.1%            13      3.5%

Most frequent character per category

Uppercase Letter
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
C                   50      40.0%           C                   43      36.8%
B                   26      20.8%           B                   28      23.9%
D                   19      15.2%           E                   17      14.5%
E                   15      12.0%           D                   16      13.7%
A                   7       5.6%            A                   9       7.7%
F                   5       4.0%            F                   3       2.6%
G                   2       1.6%            G                   1       0.9%
T                   1       0.8%

Decimal Number
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
2                   47      18.8%           1                   37      15.5%
1                   38      15.2%           2                   36      15.1%
3                   32      12.8%           3                   26      10.9%
6                   28      11.2%           6                   26      10.9%
5                   22      8.8%            5                   25      10.5%
0                   19      7.6%            8                   25      10.5%
8                   19      7.6%            4                   18      7.6%
7                   17      6.8%            0                   17      7.1%
4                   17      6.8%            7                   16      6.7%
9                   11      4.4%            9                   12      5.0%

Space Separator
                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
(space)             20      100.0%          13      100.0%

Most occurring scripts

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
Common              270     68.4%           251     68.2%
Latin               125     31.6%           117     31.8%

Most frequent character per script

Latin
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
C                   50      40.0%           C                   43      36.8%
B                   26      20.8%           B                   28      23.9%
D                   19      15.2%           E                   17      14.5%
E                   15      12.0%           D                   16      13.7%
A                   7       5.6%            A                   9       7.7%
F                   5       4.0%            F                   3       2.6%
G                   2       1.6%            G                   1       0.9%
T                   1       0.8%

Common
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
2                   47      17.4%           1                   37      14.7%
1                   38      14.1%           2                   36      14.3%
3                   32      11.9%           3                   26      10.4%
6                   28      10.4%           6                   26      10.4%
5                   22      8.1%            5                   25      10.0%
(space)             20      7.4%            8                   25      10.0%
0                   19      7.0%            4                   18      7.2%
8                   19      7.0%            0                   17      6.8%
7                   17      6.3%            7                   16      6.4%
4                   17      6.3%            (space)             13      5.2%

Most occurring blocks

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
ASCII               395     100.0%          368     100.0%

Most frequent character per block

ASCII
Dataset A                                   Dataset B
Value               Count   Frequency (%)   Value               Count   Frequency (%)
C                   50      12.7%           C                   43      11.7%
2                   47      11.9%           1                   37      10.1%
1                   38      9.6%            2                   36      9.8%
3                   32      8.1%            B                   28      7.6%
6                   28      7.1%            3                   26      7.1%
B                   26      6.6%            6                   26      7.1%
5                   22      5.6%            5                   25      6.8%
(space)             20      5.1%            8                   25      6.8%
0                   19      4.8%            4                   18      4.9%
D                   19      4.8%            0                   17      4.6%
Other values (9)    94      23.8%           Other values (8)    87      23.6%

Embarked
Categorical

                                  Dataset A       Dataset B
Distinct                          3               3
Distinct (%)                      0.7%            0.7%
Missing                           1               2
Missing (%)                       0.2%            0.4%
Memory size                       7.0 KiB         7.0 KiB

Value counts: S (322), C (86), Q (37) in Dataset A; S (324), C (81), Q (39) in Dataset B.

Length

                                  Dataset A       Dataset B
Max length                        1               1
Median length                     1               1
Mean length                       1               1
Min length                        1               1

Characters and Unicode

                                  Dataset A       Dataset B
Total characters                  445             444
Distinct characters               3               3
Distinct categories               1               1
Distinct scripts                  1               1
Distinct blocks                   1               1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

                                  Dataset A       Dataset B
Unique                            0               0
Unique (%)                        0.0%            0.0%

Sample

                                  Dataset A       Dataset B
1st row                           S               S
2nd row                           S               S
3rd row                           C               S
4th row                           S               S
5th row                           S               S

Common Values

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
S                   322     72.2%           324     72.6%
C                   86      19.3%           81      18.2%
Q                   37      8.3%            39      8.7%
(Missing)           1       0.2%            2       0.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figures: common values for Dataset A and Dataset B]

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
s                   322     72.4%           324     73.0%
c                   86      19.3%           81      18.2%
q                   37      8.3%            39      8.8%

Most occurring characters

Value | Count | Frequency (%)
S | 322 | 72.4%
C | 86 | 19.3%
Q | 37 | 8.3%

Value | Count | Frequency (%)
S | 324 | 73.0%
C | 81 | 18.2%
Q | 39 | 8.8%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 445 | 100.0%

Value | Count | Frequency (%)
Uppercase Letter | 444 | 100.0%

Most frequent character per category

Uppercase Letter

Value | Count | Frequency (%)
S | 322 | 72.4%
C | 86 | 19.3%
Q | 37 | 8.3%

Value | Count | Frequency (%)
S | 324 | 73.0%
C | 81 | 18.2%
Q | 39 | 8.8%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 445 | 100.0%

Value | Count | Frequency (%)
Latin | 444 | 100.0%

Most frequent character per script

Latin

Value | Count | Frequency (%)
S | 322 | 72.4%
C | 86 | 19.3%
Q | 37 | 8.3%

Value | Count | Frequency (%)
S | 324 | 73.0%
C | 81 | 18.2%
Q | 39 | 8.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 445 | 100.0%

Value | Count | Frequency (%)
ASCII | 444 | 100.0%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
S | 322 | 72.4%
C | 86 | 19.3%
Q | 37 | 8.3%

Value | Count | Frequency (%)
S | 324 | 73.0%
C | 81 | 18.2%
Q | 39 | 8.8%

Interactions

[Figures: 25 interaction plots, each shown once for Dataset A and once for Dataset B; images not reproduced]

Correlations

Dataset A

[Figure: correlation plot, Dataset A]

Dataset B

[Figure: correlation plot, Dataset B]

Dataset A

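Variable | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived
Age | 1.000 | 0.000 | 0.151 | -0.268 | 0.044 | 0.287 | 0.000 | -0.200 | 0.125
Embarked | 0.000 | 1.000 | -0.082 | 0.054 | -0.003 | 0.262 | 0.020 | 0.030 | 0.172
Fare | 0.151 | -0.082 | 1.000 | 0.422 | -0.031 | 0.477 | 0.182 | 0.498 | 0.307
Parch | -0.268 | 0.054 | 0.422 | 1.000 | -0.005 | 0.041 | 0.311 | 0.484 | 0.154
PassengerId | 0.044 | -0.003 | -0.031 | -0.005 | 1.000 | 0.072 | 0.000 | -0.140 | 0.052
Pclass | 0.287 | 0.262 | 0.477 | 0.041 | 0.072 | 1.000 | 0.144 | -0.086 | 0.347
Sex | 0.000 | 0.020 | 0.182 | 0.311 | 0.000 | 0.144 | 1.000 | -0.223 | 0.555
SibSp | -0.200 | 0.030 | 0.498 | 0.484 | -0.140 | -0.086 | -0.223 | 1.000 | 0.148
Survived | 0.125 | 0.172 | 0.307 | 0.154 | 0.052 | 0.347 | 0.555 | 0.148 | 1.000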

Dataset B

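Variable | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived
Age | 1.000 | 0.101 | 0.193 | -0.123 | 0.069 | 0.268 | 0.030 | -0.122 | 0.100
Embarked | 0.101 | 1.000 | -0.067 | 0.042 | 0.017 | 0.206 | 0.081 | -0.002 | 0.132
Fare | 0.193 | -0.067 | 1.000 | 0.378 | -0.040 | 0.494 | 0.190 | 0.460 | 0.298
Parch | -0.123 | 0.042 | 0.378 | 1.000 | -0.100 | 0.020 | 0.305 | 0.432 | 0.132
PassengerId | 0.069 | 0.017 | -0.040 | -0.100 | 1.000 | 0.000 | 0.055 | -0.085 | 0.126
Pclass | 0.268 | 0.206 | 0.494 | 0.020 | 0.000 | 1.000 | 0.163 | -0.083 | 0.310
Sex | 0.030 | 0.081 | 0.190 | 0.305 | 0.055 | 0.163 | 1.000 | -0.202 | 0.574
SibSp | -0.122 | -0.002 | 0.460 | 0.432 | -0.085 | -0.083 | -0.202 | 1.000 | 0.198
Survived | 0.100 | 0.132 | 0.298 | 0.132 | 0.126 | 0.310 | 0.574 | 0.198 | 1.000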
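
To connect the matrices to the "High Correlation" alerts in the overview, the sketch below scans the Dataset A matrix for off-diagonal pairs whose absolute value reaches a cutoff. The cutoff of 0.5 and the helper name `strong_pairs` are assumptions made for illustration; the report itself does not state which threshold triggered the alerts.

```python
from itertools import combinations

names = ["Age", "Embarked", "Fare", "Parch", "PassengerId",
         "Pclass", "Sex", "SibSp", "Survived"]

# Dataset A correlation matrix, copied from the table above.
matrix_a = [
    [1.000, 0.000, 0.151, -0.268, 0.044, 0.287, 0.000, -0.200, 0.125],
    [0.000, 1.000, -0.082, 0.054, -0.003, 0.262, 0.020, 0.030, 0.172],
    [0.151, -0.082, 1.000, 0.422, -0.031, 0.477, 0.182, 0.498, 0.307],
    [-0.268, 0.054, 0.422, 1.000, -0.005, 0.041, 0.311, 0.484, 0.154],
    [0.044, -0.003, -0.031, -0.005, 1.000, 0.072, 0.000, -0.140, 0.052],
    [0.287, 0.262, 0.477, 0.041, 0.072, 1.000, 0.144, -0.086, 0.347],
    [0.000, 0.020, 0.182, 0.311, 0.000, 0.144, 1.000, -0.223, 0.555],
    [-0.200, 0.030, 0.498, 0.484, -0.140, -0.086, -0.223, 1.000, 0.148],
    [0.125, 0.172, 0.307, 0.154, 0.052, 0.347, 0.555, 0.148, 1.000],
]


def strong_pairs(names, matrix, threshold=0.5):
    """Return (name_i, name_j, value) for off-diagonal pairs with |value| >= threshold.

    The default threshold of 0.5 is an assumption, not a value taken from the report.
    """
    return [
        (names[i], names[j], matrix[i][j])
        for i, j in combinations(range(len(names)), 2)
        if abs(matrix[i][j]) >= threshold
    ]


print(strong_pairs(names, matrix_a))  # [('Sex', 'Survived', 0.555)]
```

Under this assumed cutoff only Sex and Survived qualify in Dataset A, while Fare and SibSp (0.498) fall just short. That agrees with the overview, where Sex and Survived are the only variables carrying a High Correlation alert.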

Missing values

Dataset A

[Figure: nullity count by column, Dataset A]
A simple visualization of nullity by column.

Dataset B

[Figure: nullity count by column, Dataset B]
A simple visualization of nullity by column.

Dataset A

[Figure: nullity matrix, Dataset A]
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Dataset B

[Figure: nullity matrix, Dataset B]
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Dataset A

[Figure: nullity correlation heatmap, Dataset A]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Dataset B

[Figure: nullity correlation heatmap, Dataset B]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
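
One common way to compute a nullity correlation is to replace each column by a 0/1 "is missing" indicator and take the Pearson correlation between indicators. The sketch below does this on a handful of hypothetical rows; the data, the helper names, and the choice of Pearson correlation are assumptions for illustration and are not taken from the report.

```python
from math import sqrt

# Hypothetical rows; None marks a missing cell.
rows = [
    {"Age": 22.0, "Cabin": None, "Embarked": "S"},
    {"Age": None, "Cabin": None, "Embarked": "Q"},
    {"Age": 38.0, "Cabin": "C85", "Embarked": "C"},
    {"Age": None, "Cabin": None, "Embarked": "S"},
    {"Age": 35.0, "Cabin": "C123", "Embarked": "S"},
    {"Age": 54.0, "Cabin": None, "Embarked": "S"},
]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, or None if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None  # a column that is never (or always) missing has no defined correlation
    return cov / sqrt(var_x * var_y)


def nullity_correlation(rows, col_a, col_b):
    """Correlate the missing-value indicators of two columns."""
    missing_a = [1 if row[col_a] is None else 0 for row in rows]
    missing_b = [1 if row[col_b] is None else 0 for row in rows]
    return pearson(missing_a, missing_b)


print(round(nullity_correlation(rows, "Age", "Cabin"), 3))  # 0.5
print(nullity_correlation(rows, "Age", "Embarked"))         # None
```

A value near 1 means the two columns tend to be missing in the same rows, and a value near -1 means one tends to be present when the other is missing. Columns with no missing values at all, such as Embarked in this toy sample, have no defined value.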

Sample

Dataset A

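Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
190 | 191 | 1 | 2 | Pinsky, Mrs. (Rosa) | female | 32.0 | 0 | 0 | 234604 | 13.0000 | NaN | S
721 | 722 | 0 | 3 | Jensen, Mr. Svend Lauritz | male | 17.0 | 1 | 0 | 350048 | 7.0542 | NaN | S
448 | 449 | 1 | 3 | Baclini, Miss. Marie Catherine | female | 5.0 | 2 | 1 | 2666 | 19.2583 | NaN | C
695 | 696 | 0 | 2 | Chapman, Mr. Charles Henry | male | 52.0 | 0 | 0 | 248731 | 13.5000 | NaN | S
287 | 288 | 0 | 3 | Naidenoff, Mr. Penko | male | 22.0 | 0 | 0 | 349206 | 7.8958 | NaN | S
727 | 728 | 1 | 3 | Mannion, Miss. Margareth | female | NaN | 0 | 0 | 36866 | 7.7375 | NaN | Q
317 | 318 | 0 | 2 | Moraweck, Dr. Ernest | male | 54.0 | 0 | 0 | 29011 | 14.0000 | NaN | S
414 | 415 | 1 | 3 | Sundman, Mr. Johan Julian | male | 44.0 | 0 | 0 | STON/O 2. 3101269 | 7.9250 | NaN | S
558 | 559 | 1 | 1 | Taussig, Mrs. Emil (Tillie Mandelbaum) | female | 39.0 | 1 | 1 | 110413 | 79.6500 | E67 | S
706 | 707 | 1 | 2 | Kelly, Mrs. Florence "Fannie" | female | 45.0 | 0 | 0 | 223596 | 13.5000 | NaN | S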

Dataset B

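Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
105 | 106 | 0 | 3 | Mionoff, Mr. Stoytcho | male | 28.0 | 0 | 0 | 349207 | 7.8958 | NaN | S
507 | 508 | 1 | 1 | Bradley, Mr. George ("George Arthur Brayton") | male | NaN | 0 | 0 | 111427 | 26.5500 | NaN | S
864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NaN | S
286 | 287 | 1 | 3 | de Mulder, Mr. Theodore | male | 30.0 | 0 | 0 | 345774 | 9.5000 | NaN | S
672 | 673 | 0 | 2 | Mitchell, Mr. Henry Michael | male | 70.0 | 0 | 0 | C.A. 24580 | 10.5000 | NaN | S
153 | 154 | 0 | 3 | van Billiard, Mr. Austin Blyler | male | 40.5 | 0 | 2 | A/5. 851 | 14.5000 | NaN | S
329 | 330 | 1 | 1 | Hippach, Miss. Jean Gertrude | female | 16.0 | 0 | 1 | 111361 | 57.9792 | B18 | C
50 | 51 | 0 | 3 | Panula, Master. Juha Niilo | male | 7.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S
865 | 866 | 1 | 2 | Bystrom, Mrs. (Karolina) | female | 42.0 | 0 | 0 | 236852 | 13.0000 | NaN | S
774 | 775 | 1 | 2 | Hocking, Mrs. Elizabeth (Eliza Needs) | female | 54.0 | 1 | 3 | 29105 | 23.0000 | NaN | S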

Dataset A

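Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
433 | 434 | 0 | 3 | Kallio, Mr. Nikolai Erland | male | 17.0 | 0 | 0 | STON/O 2. 3101274 | 7.1250 | NaN | S
482 | 483 | 0 | 3 | Rouse, Mr. Richard Henry | male | 50.0 | 0 | 0 | A/5 3594 | 8.0500 | NaN | S
804 | 805 | 1 | 3 | Hedman, Mr. Oskar Arvid | male | 27.0 | 0 | 0 | 347089 | 6.9750 | NaN | S
591 | 592 | 1 | 1 | Stephenson, Mrs. Walter Bertram (Martha Eustis) | female | 52.0 | 1 | 0 | 36947 | 78.2667 | D20 | C
388 | 389 | 0 | 3 | Sadlier, Mr. Matthew | male | NaN | 0 | 0 | 367655 | 7.7292 | NaN | Q
382 | 383 | 0 | 3 | Tikkanen, Mr. Juho | male | 32.0 | 0 | 0 | STON/O 2. 3101293 | 7.9250 | NaN | S
618 | 619 | 1 | 2 | Becker, Miss. Marion Louise | female | 4.0 | 2 | 1 | 230136 | 39.0000 | F4 | S
74 | 75 | 1 | 3 | Bing, Mr. Lee | male | 32.0 | 0 | 0 | 1601 | 56.4958 | NaN | S
521 | 522 | 0 | 3 | Vovk, Mr. Janko | male | 22.0 | 0 | 0 | 349252 | 7.8958 | NaN | S
544 | 545 | 0 | 1 | Douglas, Mr. Walter Donald | male | 50.0 | 1 | 0 | PC 17761 | 106.4250 | C86 | C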

Dataset B

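Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
847 | 848 | 0 | 3 | Markoff, Mr. Marin | male | 35.00 | 0 | 0 | 349213 | 7.8958 | NaN | C
163 | 164 | 0 | 3 | Calic, Mr. Jovo | male | 17.00 | 0 | 0 | 315093 | 8.6625 | NaN | S
221 | 222 | 0 | 2 | Bracken, Mr. James H | male | 27.00 | 0 | 0 | 220367 | 13.0000 | NaN | S
803 | 804 | 1 | 3 | Thomas, Master. Assad Alexander | male | 0.42 | 0 | 1 | 2625 | 8.5167 | NaN | C
237 | 238 | 1 | 2 | Collyer, Miss. Marjorie "Lottie" | female | 8.00 | 0 | 2 | C.A. 31921 | 26.2500 | NaN | S
164 | 165 | 0 | 3 | Panula, Master. Eino Viljami | male | 1.00 | 4 | 1 | 3101295 | 39.6875 | NaN | S
799 | 800 | 0 | 3 | Van Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert) | female | 30.00 | 1 | 1 | 345773 | 24.1500 | NaN | S
187 | 188 | 1 | 1 | Romaine, Mr. Charles Hallace ("Mr C Rolmane") | male | 45.00 | 0 | 0 | 111428 | 26.5500 | NaN | S
99 | 100 | 0 | 2 | Kantor, Mr. Sinai | male | 34.00 | 1 | 0 | 244367 | 26.0000 | NaN | S
682 | 683 | 0 | 3 | Olsvigen, Mr. Thor Anderson | male | 20.00 | 0 | 0 | 6563 | 9.2250 | NaN | S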

Duplicate rows

Dataset A

PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked# duplicates
Dataset does not contain duplicate rows.

Dataset B

PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked# duplicates
Dataset does not contain duplicate rows.
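
For reference, a duplicate-row check of the kind summarized above can be sketched by counting identical rows. The rows and the helper name `duplicate_rows` below are hypothetical, and missing cells are represented by `None`; this is an illustration, not the report's own implementation.

```python
from collections import Counter

# Hypothetical rows as (PassengerId, Survived, Pclass, Embarked) tuples.
rows = [
    (191, 1, 2, "S"),
    (722, 0, 3, "S"),
    (449, 1, 3, "C"),
]


def duplicate_rows(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


print(duplicate_rows(rows))               # {}
print(duplicate_rows(rows + [rows[0]]))   # {(191, 1, 2, 'S'): 2}
```

Since PassengerId is unique in both datasets, no two complete rows can be identical, which is consistent with the empty result reported for Dataset A and Dataset B.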